BUG: resampling with NaT in TimedeltaIndex (#13223) #14649

paulgliu · 2016-11-14T01:46:12Z

closes Resampling with NaT in TimedeltaIndex raises MemoryError #13223
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

codecov-io · 2016-11-14T02:32:08Z

Current coverage is 85.28% (diff: 100%)

Merging #14649 into master will increase coverage by <.01%

@@             master     #14649   diff @@
==========================================
  Files           140        140          
  Lines         50693      50698     +5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43234      43240     +6   
+ Misses         7459       7458     -1   
  Partials          0          0

Powered by Codecov. Last update 1d6dbb4...d112e73

codecov-io · 2016-11-14T02:32:08Z

Codecov Report

Merging #14649 into master will increase coverage by 4.64%.
The diff coverage is 96.22%.

@@            Coverage Diff             @@
##           master   #14649      +/-   ##
==========================================
+ Coverage   86.33%   90.97%   +4.64%     
==========================================
  Files         139      145       +6     
  Lines       51149    49481    -1668     
==========================================
+ Hits        44157    45014     +857     
+ Misses       6992     4467    -2525

Flag	Coverage Δ
#multiple	`88.73% <96.02%> (?)`
#single	`40.67% <41.8%> (?)`

Impacted Files	Coverage Δ
pandas/stats/moments.py	`71.19% <ø> (ø)`	⬆️
pandas/io/html.py	`84.85% <ø> (+0.36%)`	⬆️
pandas/io/sas/sasreader.py	`85.18% <ø> (+77.77%)`	⬆️
pandas/core/config.py	`88.09% <ø> (ø)`	⬆️
pandas/io/feather_format.py	`86.66% <ø> (ø)`	⬆️
pandas/sparse/list.py	`97.1% <ø> (ø)`	⬆️
pandas/indexes/numeric.py	`97.1% <ø> (-0.05%)`	⬇️
pandas/parser.py	`100% <ø> (ø)`
pandas/indexes/range.py	`92.11% <ø> (+0.02%)`	⬆️
pandas/compat/numpy/__init__.py	`93.93% <ø> (ø)`	⬆️
... and 156 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c26e5bb...5372175.

jreback · 2016-11-15T11:35:09Z

pandas/tseries/resample.py

+            binner = binner.insert(0, tslib.NaT)
+            labels = labels.insert(0, tslib.NaT)
+
+            n_NaT = sum([ax_i is tslib.NaT for ax_i in ax])


n_NaT = ax._isnan.sum()

Will change. Thanks
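
For readers following the suggestion above, here is a minimal sketch of the two ways to count missing entries. It assumes the public `isna()` method as a stand-in for the private `_isnan` attribute named in the review, and the sample index is made up for illustration.

```python
import pandas as pd

# Illustrative index with one missing entry (same shape as the PR's test data).
index = pd.to_timedelta(['0s', pd.NaT, '2s'])

# Element-by-element identity check, as in the original diff.
n_nat_loop = sum(value is pd.NaT for value in index)

# Vectorized count; isna() is the public counterpart assumed here
# in place of the private ax._isnan used inside pandas.
n_nat_vectorized = int(index.isna().sum())

print(n_nat_loop, n_nat_vectorized)
```

Both counts agree; the vectorized form avoids a Python-level loop over every element of the axis.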

jreback · 2016-11-15T11:37:48Z

pandas/tseries/resample.py

@@ -1204,6 +1206,13 @@ def _get_time_delta_bins(self, ax):
        end_stamps = labels + 1
        bins = ax.searchsorted(end_stamps, side='left')

+        if ax.hasnans:


though we actually need to ignore the nans, doesn't this actually create a NaT group? this is inconsistent with grouping.

NaT group will be ignored in aggregation. It is handled the same way as in function _get_time_bins.

add a comment here on what you are doing
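
As a point of comparison for the "consistent with grouping" question, the sketch below shows how a plain `groupby` treats a missing key by default. The data is hypothetical, and this illustrates `groupby` only, not the resample code path under review.

```python
import pandas as pd

values = pd.Series([1, 2, 3])
# Hypothetical grouping keys; the middle one is missing.
keys = pd.to_timedelta(['0s', pd.NaT, '0s'])

# With the default settings, rows whose key is NaT are left out of the result,
# so only the '0s' group remains and it sums the first and third values.
result = values.groupby(keys).sum()
print(result)
```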

jreback · 2016-11-15T11:38:18Z

pandas/tseries/tests/test_resample.py

@@ -970,6 +970,15 @@ def test_resample_timedelta_idempotency(self):
        expected = series
        assert_series_equal(result, expected)

+    def test_resample_timedelta_missing_values(self):
+        # GH 13223
+        index = pd.to_timedelta(['0s', pd.NaT, '2s'])


test with an all NaT index

jreback · 2016-12-26T21:45:14Z

pandas/tseries/resample.py

@@ -1189,13 +1190,18 @@ def _get_time_delta_bins(self, ax):
            raise TypeError('axis must be a TimedeltaIndex, but got '
                            'an instance of %r' % type(ax).__name__)

+        if len(ax) > 0 and all(ax._isnan):


I would give a better error message here. Is this consistent with how we handle all-nan grouping?

All-nan grouping doesn't seem to be handled elsewhere. Any suggestions?

we currently ignore all nan groups, that's why I think it is there, so this is consistent, maybe a comment is worth it here.

so this is the case where all the data is nan. Hmm, I would just give a better error message then.

use ax.hasnans
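
One thing worth keeping in mind when weighing the two checks discussed in this thread: `hasnans` reports whether any entry is missing, while the `all(...)` form in the diff asks whether every entry is missing. A small sketch of the difference, using the public `isna()` rather than the private `_isnan` (an assumption) and a hypothetical helper name:

```python
import pandas as pd


def is_all_nat(index):
    """Return True only for a non-empty index whose entries are all missing.

    Hypothetical helper for illustration; not part of pandas.
    """
    # The length guard matters: .all() on an empty array is True.
    return len(index) > 0 and bool(index.isna().all())


partial = pd.to_timedelta(['0s', pd.NaT, '2s'])
all_nat = pd.to_timedelta([pd.NaT, pd.NaT, pd.NaT])
empty = pd.TimedeltaIndex([])

for name, idx in [('partial', partial), ('all_nat', all_nat), ('empty', empty)]:
    print(name, bool(idx.hasnans), is_all_nat(idx))
```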

jreback · 2016-12-26T21:45:30Z

can you rebase.

jreback · 2017-01-02T16:06:14Z

pandas/tseries/tests/test_resample.py

+        # all NaT
+        index = pd.to_timedelta([pd.NaT, pd.NaT, pd.NaT])
+        series = pd.Series([2, 3, 5], index=index)
+        self.assertRaises(DataError, series.resample('1s').mean)


This is probably the right idea (e.g. raising a nice message),

e.g. we do this for all-nan groups (which is unfriendly)

In [8]: s = Series([1,2,3],[np.nan,np.nan,np.nan])
In [9]: s.groupby(s.index).sum()
TypeError: unhashable type: 'Float64Index'

paulgliu · 2017-02-01T04:04:02Z

Rebased

jreback

looks pretty good. some doc fixups

pls add a whatsnew entry in bug fixes (0.20.0).

ping on green.

jreback · 2017-02-01T14:14:28Z

pandas/tseries/resample.py

@@ -1189,13 +1190,18 @@ def _get_time_delta_bins(self, ax):
            raise TypeError('axis must be a TimedeltaIndex, but got '
                            'an instance of %r' % type(ax).__name__)

+        if len(ax) > 0 and all(ax._isnan):


use ax.hasnans

jreback · 2017-02-01T14:15:39Z

pandas/tseries/resample.py

@@ -1204,6 +1206,13 @@ def _get_time_delta_bins(self, ax):
        end_stamps = labels + 1
        bins = ax.searchsorted(end_stamps, side='left')

+        if ax.hasnans:


add a comment here on what you are doing

jreback · 2017-02-01T14:16:32Z

pandas/tseries/resample.py

        if not len(ax):
            binner = labels = TimedeltaIndex(
                data=[], freq=self.freq, name=ax.name)
            return binner, [], labels

-        start = ax[0]
-        end = ax[-1]
+        # Addresses GH #13223


add a comment on what you are doing here (and why not just selecting start/end)
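
The replacement lines are not shown in this excerpt, so the following is only a sketch of why taking the first and last elements can go wrong when the axis contains NaT. It assumes the fix relies on something like `min()`/`max()`, which skip missing values; the index is made up.

```python
import pandas as pd

# Hypothetical axis where the missing value happens to sit first.
index = pd.to_timedelta([pd.NaT, '0s', '2s'])

# Positional selection can pick up NaT as a boundary.
first, last = index[0], index[-1]

# min()/max() skip NaT by default, so the boundaries stay valid timedeltas.
start, end = index.min(), index.max()

print(first, last)
print(start, end)
```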


paulgliu · 2017-04-07T03:15:32Z

rebased

jreback · 2017-04-07T12:22:17Z

when you rebase you should end up with a small number of commits.

try

git rebase -i origin/master yourbranchname
git push -f yourremote yourbranchname

jreback · 2017-05-07T14:12:39Z

closing. this needs a new PR with a cherry-pick on top of master.

jreback added Bug Resample resample method Timedelta Timedelta data type labels Nov 15, 2016

jreback reviewed Nov 15, 2016

paulgliu closed this Nov 18, 2016

paulgliu reopened this Nov 18, 2016

paulgliu closed this Nov 18, 2016

paulgliu reopened this Nov 18, 2016

jreback reviewed Dec 26, 2016

jreback reviewed Jan 2, 2017

paulgliu force-pushed the resample_bug_fix branch from d24693c to 6378e38 Compare February 1, 2017 03:48

jreback requested changes Feb 1, 2017

jorisvandenbossche and others added 12 commits February 25, 2017 22:38

DOC: fix doc build warnings (pandas-dev#15505)

303541e

DOC: Fix versionadded for cond in .where (pandas-dev#15509)

b3ae4c7

[ci skip]

BUG: Fix a bug occuring when using DataFrame.to_records with unicode

25dcff5

column names in python 2. closes pandas-dev#11879 closes pandas-dev#13462

TST: DataFrame.hist() does not get along with matplotlib.pyplot.tight…

fed1827

…_layout() (pandas-dev#15515) * Add unit test for pandas-dev#9351 * Tweaks. * add _check_plot_works; rm aux method * Add whatsnew entry.

BUG: fix groupby.aggregate resulting dtype coercion, xref pandas-dev#…

61fa8be

…11444, pandas-dev#13046 make sure .size includes the name of the grouped

DOC: Update contributing for test_fast, fix doc Windows build (pandas…

e0647ba

…-dev#15523) * DOC: Update contributing for test_fast, fix doc Windows build * add pip install for xdist

BUG: fix to_gbq calling convention; now its a bound method of DataFrame

edd2939

xref pandas-dev#15484

jreback and others added 26 commits April 3, 2017 13:25

DOC: remove gbq_integration instructions from contributing.rst (panda…

456e729

…s-dev#15879) DOC: remove vbench instructions from contributing.rst

DOC: update contributing.rst for ci (pandas-dev#15880)

ca7207f

* DOC: update contributing.rst for ci * typos & auto-cancel links * make it a note * add back accid deleted section

DOC: add section on how to use parametrize to contributing.rst (panda…

cd51bdd

…s-dev#15883) closes pandas-dev#15608

DOC: whatsnew cleaning

eedcc8f

DOC fixes in contributing.rst (pandas-dev#15887)

faf6401

DEPR: correct locations to access public tslib objects (pandas-dev#15897

0a37067

)

TST: better testing of Series.nlargest/nsmallest

dbc1654

xref pandas-dev#15299 Author: Jeff Reback <[email protected]> Closes pandas-dev#15902 from jreback/series_n and squashes the following commits: 657eac8 [Jeff Reback] TST: better testing of Series.nlargest/nsmallest

DOC: Fix a typo in travis.yml (pandas-dev#15915)

1fbdc23

Fix a docstring typo in _fill_mi_header (pandas-dev#15918)

b070d51

[ci skip]

DOC: Fix a typo in indexing.rst (pandas-dev#15916)

763197c

* DOC: Fix a typo in indexing.rst * more typos fixed

BUG: Standardize malformed row handling in Python engine (pandas-dev#…

a0b089e

…15913) Closes pandas-devgh-15910.

TST: skip decimal conversion tests on 32-bit (pandas-dev#15922)

4502e82

xref pandas-dev#15865

CLN: algos (pandas-dev#15929)

0cfc08c

* CLN: clean up select_n algos * CLN: clean ensure_data closes pandas-dev#15903 * return ndtype, so can eliminate special cases * unique * fixups

BUG: resampling with NaT in TimedeltaIndex (pandas-dev#13223)

078dbdf

BUG: resampling with NaT in TimedeltaIndex (pandas-dev#13223)

3c54c4b

fix pep8

ec144f3

better error message for all-nan groupings

30c749c

Merge branch 'resample_bug_fix' of github.com:liuguangyu0220/pandas into

5372175

resample_bug_fix

jreback closed this May 7, 2017
